Maharashtra District Geographical Analysis

In [1]:
import pandas as pd
import seaborn as sns
import numpy as np
from pandas import Series, DataFrame
import matplotlib.pyplot as plt
%matplotlib inline 
from datetime import date
In [2]:
!pip install plotly
import plotly.express as px
import plotly.graph_objects as go
Requirement already satisfied: plotly in c:\users\anshika\anaconda3\lib\site-packages (5.24.1)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\anshika\anaconda3\lib\site-packages (from plotly) (8.2.3)
Requirement already satisfied: packaging in c:\users\anshika\anaconda3\lib\site-packages (from plotly) (24.1)
In [11]:
data = pd.read_csv('maharashtra-districts.csv', encoding= 'latin-1')
data.head()
Out[11]:
District Name District Code Administrative Division Headquarters Number of Talukas Area (in sq. km) Population (Census 2011) Population Density (per sq. km) Sex Ratio Literacy Rate (%) Urban Population (%) Formation Date Geographical Coordinates (Latitude and Longitude) Major River(s) Major Crop(s) Key Industries/Economy Tourist Attractions
0 Pune PU Pune Division Pune 14 15643 9429408 603 915 86.15 60.5 1 May 1960 18.52° N, 73.85° E Bhima, Mula, Mutha, Indrayani Sugarcane, Jowar, Bajra, Grapes, Onions IT & ITeS, Automotive, Manufacturing, Educatio... Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,...
1 Satara ST Pune Division Satara 11 10480 3003741 287 988 82.87 21.9 1 May 1960 17.68° N, 74.01° E Krishna, Koyna, Venna Sugarcane, Jowar, Soybean, Turmeric Agriculture, Wind Power, Sugar Factories, Tourism Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg...
2 Sangli SN Pune Division Sangli 10 8572 2822143 329 966 81.48 25.5 1 May 1960 16.85° N, 74.58° E Krishna, Warana Sugarcane, Grapes, Turmeric, Jowar Sugar Production, Turmeric Processing, Textile... Sagareshwar Wildlife Sanctuary, Chandoli Natio...
3 Solapur SO Pune Division Solapur 11 14895 4317756 290 938 77.02 32.4 1 May 1960 17.68° N, 75.90° E Bhima, Sina, Man Jowar, Sugarcane, Pomegranate, Pulses Textiles (Chaddars), Sugar Factories, Beedi In... Siddheshwar Temple, Akkalkot Swami Samarth Mah...
4 Kolhapur KO Pune Division Kolhapur 12 7685 3876001 504 957 81.51 31.7 1 May 1960 16.70° N, 74.24° E Panchganga, Krishna, Dudhganga Sugarcane, Rice, Soybean, Jaggery Sugar Mills, Foundries, Textiles, Kolhapuri Ch... Mahalakshmi Temple, Panhala Fort, Jyotiba Temp...
In [4]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 17 columns):
 #   Column                                             Non-Null Count  Dtype  
---  ------                                             --------------  -----  
 0   District Name                                      36 non-null     object 
 1   District Code                                      36 non-null     object 
 2   Administrative Division                            36 non-null     object 
 3   Headquarters                                       36 non-null     object 
 4   Number of Talukas                                  36 non-null     int64  
 5   Area (in sq. km)                                   36 non-null     int64  
 6   Population (Census 2011)                           36 non-null     int64  
 7   Population Density (per sq. km)                    36 non-null     int64  
 8   Sex Ratio                                          36 non-null     int64  
 9   Literacy Rate (%)                                  36 non-null     float64
 10  Urban Population (%)                               36 non-null     float64
 11  Formation Date                                     36 non-null     object 
 12  Geographical Coordinates (Latitude and Longitude)  36 non-null     object 
 13  Major River(s)                                     36 non-null     object 
 14  Major Crop(s)                                      34 non-null     object 
 15  Key Industries/Economy                             36 non-null     object 
 16  Tourist Attractions                                36 non-null     object 
dtypes: float64(2), int64(5), object(10)
memory usage: 4.9+ KB
In [5]:
data.describe()
Out[5]:
Number of Talukas Area (in sq. km) Population (Census 2011) Population Density (per sq. km) Sex Ratio Literacy Rate (%) Urban Population (%)
count 36.000000 36.000000 3.600000e+01 36.000000 36.000000 36.000000 36.000000
mean 9.944444 8560.666667 3.119993e+06 1436.944444 947.333333 80.864167 33.916667
std 3.970926 4095.966971 2.155514e+06 4569.022377 46.746734 5.596462 22.232814
min 0.000000 157.000000 8.496510e+05 74.000000 832.000000 64.380000 11.000000
25% 7.750000 5614.750000 1.655256e+06 240.750000 929.500000 77.200000 19.450000
50% 9.500000 7701.500000 2.610229e+06 286.000000 944.500000 81.910000 26.350000
75% 14.000000 10880.500000 3.744962e+06 366.500000 959.500000 84.635000 37.550000
max 16.000000 17048.000000 9.429408e+06 20980.000000 1122.000000 89.910000 100.000000

The summary suggests that Number of Talukas, Sex Ratio and Literacy Rate are roughly symmetric (their means are close to their medians), and Urban Population somewhat less so. Population Density is strongly right-skewed: its mean (about 1437) is far above its median (286) because of a maximum of 20980. Area is also skewed, which matches intuition. The maximum urban share is 100%, so at least one district is fully urban. Half of the districts have a literacy rate below 81.91%. There are 36 districts in total.
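
One way to check distribution claims like these is to compare sample skewness column by column. The sketch below uses a small made-up frame (the values are illustrative and are not rows from the dataset) with pandas' `DataFrame.skew`; a value near 0 suggests rough symmetry and a large positive value a long right tail.

```python
import pandas as pd

# Illustrative values only; these are NOT rows from maharashtra-districts.csv.
toy = pd.DataFrame({
    'symmetric': [1, 2, 3, 4, 5],
    'right_skewed': [1, 1, 2, 2, 50],
})

# Sample skewness per column: near 0 means roughly symmetric,
# a large positive value means a long right tail.
skewness = toy.skew()
print(skewness)
```

On the real frame, `data.select_dtypes('number').skew()` should give the same kind of summary for every numeric column.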

Q.1 Examining which district has maximum no. of Talukas?

In [9]:
data[data['Number of Talukas'] == max(data['Number of Talukas'])]['District Name']
Out[9]:
24      Nanded
33    Yavatmal
Name: District Name, dtype: object

A large rural population and a large area are plausible reasons for these districts having so many talukas. This should not be overemphasized, however, since other factors may also determine the number of talukas.
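
The area and population hypothesis can be probed with a correlation matrix. The sketch below uses only the five rows visible in the `head()` output above, so it illustrates the call rather than a result; five districts are far too few to draw a conclusion.

```python
import pandas as pd

# The five rows shown in the head() output above.
sample = pd.DataFrame({
    'District Name': ['Pune', 'Satara', 'Sangli', 'Solapur', 'Kolhapur'],
    'Number of Talukas': [14, 11, 10, 11, 12],
    'Area (in sq. km)': [15643, 10480, 8572, 14895, 7685],
    'Population (Census 2011)': [9429408, 3003741, 2822143, 4317756, 3876001],
})

# Pairwise Pearson correlations between the numeric columns.
corr = sample.drop(columns='District Name').corr()
print(corr['Number of Talukas'])
```

Running the same `.corr()` call on the three columns of the full `data` frame would show how strongly talukas track area and population across all 36 districts.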

Q.2 What's the relation between Literacy Rate and Sex Ratio?

In [8]:
sns.lmplot(x='Sex Ratio',y= 'Literacy Rate (%)', data= data)
plt.show()
In [12]:
data['Sex Ratio'].corr(data['Literacy Rate (%)'])
Out[12]:
-0.1329962817499564

In this interesting case the fitted line slopes slightly downward, but the Pearson correlation of about -0.13 tells us that the relationship is very weak. Other factors probably influence both variables.
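
For readers unfamiliar with the coefficient, the sketch below writes Pearson's r out by hand and compares it with `Series.corr`. It uses only the five rows from the `head()` output above, so the number it prints is not the full-dataset value.

```python
import numpy as np
import pandas as pd

# First five rows of the head() output above (Sex Ratio, Literacy Rate (%)).
sex_ratio = pd.Series([915, 988, 966, 938, 957], dtype=float)
literacy = pd.Series([86.15, 82.87, 81.48, 77.02, 81.51])

# Pearson r = sum of products of deviations, divided by the
# square root of the product of the summed squared deviations.
dx = sex_ratio - sex_ratio.mean()
dy = literacy - literacy.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr defaults to Pearson, so the two numbers should agree.
r_pandas = sex_ratio.corr(literacy)
print(r_manual, r_pandas)
```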

In [13]:
from matplotlib.ticker import FuncFormatter
def converter(x,pos):
    return f'{x/1e6:.1f}M'
        
df_sorted= data.sort_values('Population (Census 2011)', ascending= False)
order_list = df_sorted['District Name']


graph= sns.barplot(x='District Name', y='Population (Census 2011)', hue= 'District Name', legend= False,
                   order= order_list, palette='summer', data= data)
graph.yaxis.set_major_formatter(FuncFormatter(converter))
plt.xticks(rotation='vertical')
plt.title('Maharashtra Districts Population as per Census 2011')
plt.show()

Pune and Mumbai Suburban (not Mumbai City) are the most populous districts as per Census 2011. Population differs widely across districts, which is consistent with the geography of Maharashtra: some parts face droughts and other problems that make them harder to live in.

Q.3 Comparing the Sex Ratio of each district with the mean sex ratio of Maharashtra.

In [15]:
state_avg = round(data['Sex Ratio'].mean(),2)
state_avg
Out[15]:
947.33
In [16]:
sns.barplot(x='District Name', y='Sex Ratio', hue='District Name', legend= False, 
            palette= 'mako', data= data, order= order_list)
plt.axhline(state_avg, color='red', label='State_avg')
plt.ylim(min(data['Sex Ratio']) -100, max(data['Sex Ratio']) +100)
plt.xticks(rotation= 'vertical')
plt.legend()
plt.title('Comparing Sex Ratio of districts with the state average')
plt.show()

Clearly, a high population does not imply a high sex ratio. Even small districts like Nandurbar do very well on sex ratio, while Mumbai City, an urban district, has the lowest. The graph suggests that several perspectives and parameters should be taken into account when interpreting the sex ratio of a place. The smaller districts could perhaps serve as models when initiating policy on this topic.
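
Note that the reference line is the unweighted mean of the district ratios, which is not the same as the sex ratio of the state's pooled population. The sketch below shows the difference on two made-up districts, assuming the column follows the census definition of females per 1000 males.

```python
import pandas as pd

# Two made-up districts; values are illustrative, not from the dataset.
toy = pd.DataFrame({
    'Population (Census 2011)': [9_000_000, 1_000_000],
    'Sex Ratio': [900, 1000],  # assumed: females per 1000 males
})

# Back out male and female counts from population P and ratio r:
# males = P / (1 + r/1000), females = P - males.
males = toy['Population (Census 2011)'] / (1 + toy['Sex Ratio'] / 1000)
females = toy['Population (Census 2011)'] - males

simple_mean = toy['Sex Ratio'].mean()                # treats every district equally
pooled_ratio = 1000 * females.sum() / males.sum()    # ratio of the combined population
print(simple_mean, round(pooled_ratio, 1))
```

The pooled figure is pulled toward the larger district, so a populous district with a low ratio lowers the state-level value more than the simple mean suggests.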

In [54]:
data.head(1)
Out[54]:
District Name District Code Administrative Division Headquarters Number of Talukas Area (in sq. km) Population (Census 2011) Population Density (per sq. km) Sex Ratio Literacy Rate (%) Urban Population (%) Formation Date Major River(s) Major Crop(s) Key Industries/Economy Tourist Attractions Latitude longitude
0 Pune PU Pune Division Pune 14 15643 9429408 603 915 86.15 60.5 1 May 1960 Bhima, Mula, Mutha, Indrayani Sugarcane, Jowar, Bajra, Grapes, Onions IT & ITeS, Automotive, Manufacturing, Educatio... Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... 18.52 73.85

Q.4 Understanding literacy rate in the rural population of Maharashtra.

In [18]:
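copied_data = data.copy()
# Urban head count = urban share (%) / 100 * total population
Urban_pop = (copied_data['Urban Population (%)']/100)*copied_data['Population (Census 2011)'] 
copied_data['Rural_pop'] = copied_data['Population (Census 2011)'] - Urban_pop
# sort_values returns a new DataFrame; the result is not assigned here,
# so copied_data keeps its original row order in the output below
copied_data.sort_values('Rural_pop', ascending= False).reset_index(drop=True)
copied_data.head(2)         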
Out[18]:
District Name District Code Administrative Division Headquarters Number of Talukas Area (in sq. km) Population (Census 2011) Population Density (per sq. km) Sex Ratio Literacy Rate (%) Urban Population (%) Formation Date Geographical Coordinates (Latitude and Longitude) Major River(s) Major Crop(s) Key Industries/Economy Tourist Attractions Rural_pop
0 Pune PU Pune Division Pune 14 15643 9429408 603 915 86.15 60.5 1 May 1960 18.52° N, 73.85° E Bhima, Mula, Mutha, Indrayani Sugarcane, Jowar, Bajra, Grapes, Onions IT & ITeS, Automotive, Manufacturing, Educatio... Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... 3724616.160
1 Satara ST Pune Division Satara 11 10480 3003741 287 988 82.87 21.9 1 May 1960 17.68° N, 74.01° E Krishna, Koyna, Venna Sugarcane, Jowar, Soybean, Turmeric Agriculture, Wind Power, Sugar Factories, Tourism Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg... 2345921.721
In [73]:
# Bin on copied_data itself; `df` is not defined until the crop loop further down
copied_data['Literacy_Category'] = pd.cut(copied_data['Literacy Rate (%)'], bins=3, labels=['Low', 'Medium', 'High'])
plot= sns.kdeplot(x='Rural_pop', hue='Literacy_Category', data=copied_data)
plot.xaxis.set_major_formatter(FuncFormatter(converter))
plt.show()

The rural population is concentrated around roughly 1 million per district. A few outlying districts reach about 4 million rural residents, and these fall in the medium literacy category. Higher literacy generally appears alongside higher density, and lower literacy alongside lower density. This may reflect how much easier it is to build infrastructure and deliver social policies in easy terrain or in densely populated rural areas.

Q.5 Analysing density levels of the Urban Population.

In [19]:
labels = [ 'Low Density', 'Medium Density', 'High Density']
copied_data['binned_data'] = pd.qcut(copied_data['Population Density (per sq. km)'], q=3, labels=labels)
grouped_data = copied_data.groupby('binned_data', observed=True)['Urban Population (%)'].mean()
grouped_data
Out[19]:
binned_data
Low Density       21.341667
Medium Density    25.741667
High Density      54.666667
Name: Urban Population (%), dtype: float64
In [20]:
sns.barplot(x='binned_data', y='Urban Population (%)', data=copied_data, hue= 'binned_data', legend=False, palette= 'viridis', errorbar=None)
plt.title('Urban Population in different density')
plt.show()

As intuition suggests, and with Mumbai's density figures in mind, the high-density districts have by far the highest average urban population share (about 55%, against about 21% and 26% for the low- and medium-density groups). This is linked to the level of urbanization and to topography.
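
The notebook bins literacy with `pd.cut` and density with `pd.qcut`, and the choice matters for skewed columns. The sketch below contrasts the two on six illustrative densities chosen to span the range reported by `describe()`; they are not a sample of the actual districts.

```python
import pandas as pd

# Illustrative densities spanning the range shown by describe() above.
density = pd.Series([74, 240, 286, 366, 603, 20980])
labels = ['Low Density', 'Medium Density', 'High Density']

# qcut: quantile-based bins, so each bin holds about the same number of rows.
by_quantile = pd.qcut(density, q=3, labels=labels)

# cut: equal-width bins, so a single extreme value can leave a bin empty.
by_width = pd.cut(density, bins=3, labels=labels)

print(by_quantile.value_counts(sort=False))
print(by_width.value_counts(sort=False))
```

With one extreme value, equal-width bins put almost every row in the lowest bin, which is why quantile bins are the more natural choice for Population Density.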

Q.6 Finding which districts in the state produce each of the top crops.

In [15]:
data.rename(columns= {'Geographical Coordinates (Latitude and Longitude)': 'Coordinates'}, inplace =True)
In [16]:
data['Latitude'] = data['Coordinates'].str.split(', ').str[0].str.strip('° N')
data['longitude'] = data['Coordinates'].str.split(', ').str[1].str.strip('° E')
data.drop(columns= 'Coordinates', inplace=True)
data.head()
Out[16]:
District Name District Code Administrative Division Headquarters Number of Talukas Area (in sq. km) Population (Census 2011) Population Density (per sq. km) Sex Ratio Literacy Rate (%) Urban Population (%) Formation Date Major River(s) Major Crop(s) Key Industries/Economy Tourist Attractions Latitude longitude
0 Pune PU Pune Division Pune 14 15643 9429408 603 915 86.15 60.5 1 May 1960 Bhima, Mula, Mutha, Indrayani Sugarcane, Jowar, Bajra, Grapes, Onions IT & ITeS, Automotive, Manufacturing, Educatio... Shaniwar Wada, Aga Khan Palace, Sinhagad Fort,... 18.52 73.85
1 Satara ST Pune Division Satara 11 10480 3003741 287 988 82.87 21.9 1 May 1960 Krishna, Koyna, Venna Sugarcane, Jowar, Soybean, Turmeric Agriculture, Wind Power, Sugar Factories, Tourism Kaas Plateau, Mahabaleshwar, Panchgani, Thoseg... 17.68 74.01
2 Sangli SN Pune Division Sangli 10 8572 2822143 329 966 81.48 25.5 1 May 1960 Krishna, Warana Sugarcane, Grapes, Turmeric, Jowar Sugar Production, Turmeric Processing, Textile... Sagareshwar Wildlife Sanctuary, Chandoli Natio... 16.85 74.58
3 Solapur SO Pune Division Solapur 11 14895 4317756 290 938 77.02 32.4 1 May 1960 Bhima, Sina, Man Jowar, Sugarcane, Pomegranate, Pulses Textiles (Chaddars), Sugar Factories, Beedi In... Siddheshwar Temple, Akkalkot Swami Samarth Mah... 17.68 75.90
4 Kolhapur KO Pune Division Kolhapur 12 7685 3876001 504 957 81.51 31.7 1 May 1960 Panchganga, Krishna, Dudhganga Sugarcane, Rice, Soybean, Jaggery Sugar Mills, Foundries, Textiles, Kolhapuri Ch... Mahalakshmi Temple, Panhala Fort, Jyotiba Temp... 16.70 74.24
In [17]:
data['Latitude'] = pd.to_numeric(data['Latitude'])
data['longitude'] = pd.to_numeric(data['longitude'])
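
As a possible alternative to the `split`/`strip` steps above, the two numbers can be captured with a single regular expression. This is only a sketch: it assumes every coordinate follows the `18.52° N, 73.85° E` pattern seen in the `head()` output, with northern latitudes and eastern longitudes, so no sign handling is needed.

```python
import pandas as pd

# Two coordinate strings copied from the head() output above.
coords = pd.Series(['18.52° N, 73.85° E', '17.68° N, 74.01° E'])

# Capture the two decimal numbers directly instead of stripping characters;
# the named groups become the column names of the result.
pattern = r'(?P<Latitude>\d+(?:\.\d+)?)\D+(?P<longitude>\d+(?:\.\d+)?)'
parsed = coords.str.extract(pattern).astype(float)
print(parsed)
```

Note that `str.strip('° N')` treats its argument as a set of characters to remove from both ends rather than as a literal suffix, which works for this data but can surprise on other inputs.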
In [21]:
data['Major Crop(s)'].unique()
Out[21]:
array(['Sugarcane, Jowar, Bajra, Grapes, Onions',
       'Sugarcane, Jowar, Soybean, Turmeric',
       'Sugarcane, Grapes, Turmeric, Jowar',
       'Jowar, Sugarcane, Pomegranate, Pulses',
       'Sugarcane, Rice, Soybean, Jaggery', nan,
       'Rice, Vegetables, Fruits', 'Rice, Chickoo, Coconut',
       'Rice, Mango, Cashew Nut, Coconut',
       'Alphonso Mango, Rice, Cashew Nut, Coconut',
       'Alphonso Mango, Cashew Nut, Coconut, Kokum, Rice',
       'Grapes, Onions, Pomegranate, Sugarcane',
       'Sugarcane, Jowar, Bajra, Pulses',
       'Cotton, Jowar, Chilli, Groundnut',
       'Banana, Cotton, Jowar, Pulses', 'Jowar, Cotton, Chilli, Maize',
       'Cotton, Maize, Jowar, Bajra',
       'Sweet Orange (Mosambi), Cotton, Jowar',
       'Cotton, Sugarcane, Jowar, Bajra',
       'Soybean, Pulses (Tur), Grapes, Jowar',
       'Jowar, Soybean, Pulses, Sugarcane',
       'Cotton, Jowar, Turmeric, Soybean', 'Jowar, Cotton, Soybean',
       'Cotton, Soybean, Jowar, Banana', 'Oranges, Cotton, Soybean, Rice',
       'Cotton, Soybean, Pulses (Tur)', 'Rice, Jowar, Pulses',
       'Rice, Pulses, Linseed', 'Cotton, Rice, Soybean, Pulses',
       'Rice, Tendu Leaves, Bamboo, Mahua',
       'Cotton, Soybean, Tur (Pigeon Pea), Oranges',
       'Cotton, Jowar, Soybean, Pulses', 'Cotton, Soybean, Pulses',
       'Cotton, Jowar, Maize, Soybean', 'Soybean, Cotton, Tur, Jowar'],
      dtype=object)
In [21]:
ser1= data['Major Crop(s)'].str.split(', ').explode() # Extracting the unique crops and making a dataframe of theirs
expanded_df = data.loc[ser1.index].copy() #Setting the new dataframe index based on the original data index
expanded_df['Major Crop(s)'] = ser1.values
expanded_df['Major Crop(s)'] = expanded_df['Major Crop(s)'].str.strip()
unique_crop= expanded_df['Major Crop(s)'].value_counts()


sns.barplot(x=unique_crop.values, y= unique_crop.index, palette= 'cool_d', hue=unique_crop.values, legend=False)
plt.title('Major Crops Produced in the state')
plt.show()

Jowar is clearly the most common major crop, which reflects both the climate and Maharashtra's position as the top producer of this crop. A wide variety of crops is nevertheless grown in the state, and some, such as oranges and Alphonso mangoes, are closely associated with it.

In [22]:
Top_4_crops= unique_crop.head(4).index.tolist()
Top_4_crops
Out[22]:
['Jowar', 'Cotton', 'Soybean', 'Rice']

Jowar, Cotton, Soybean and Rice are the four most frequently listed crops.
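
The counting step behind this ranking can be reproduced in isolation. The sketch below uses three crop strings taken from the `unique()` output above plus one missing value, and chains `split`, `explode` and `value_counts` directly on the Series.

```python
import pandas as pd

# Three crop strings copied from the unique() output above, plus a missing value.
crops = pd.Series([
    'Sugarcane, Jowar, Soybean, Turmeric',
    'Jowar, Cotton, Soybean',
    'Rice, Pulses, Linseed',
    None,
])

# Split each string into a list, give every crop its own row, and count.
# value_counts drops missing values by default.
counts = crops.str.split(', ').explode().str.strip().value_counts()
print(counts.head(2))
```

The counts measure how many districts list a crop among their major crops, not how much of it is produced.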

In [23]:
import json
with open('Maharashtra.geojson', 'r') as f:
    geojson = json.load(f)
In [24]:
for crop in Top_4_crops:
    df = data[data['Major Crop(s)'].str.contains(crop, case=False, na= False)]
    unique_districts = df['District Name'].unique().tolist()
    z_values = list(range(len(unique_districts)))


    # Creating the choropleth map for this crop
    fig = go.Figure(go.Choropleth(
        geojson=geojson,
        locations=unique_districts,
        z=z_values,
        colorscale=px.colors.qualitative.Plotly,
        locationmode='geojson-id',
        featureidkey='properties.Dist_Name', 
        marker_opacity=0.5,
        marker_line_width=1,
        marker_line_color='black',
        showscale= False # Hide the color bar
    ))
    
    # Setting the map layout
    fig.update_geos(
        visible=True,
        scope='asia',
        center={"lon": 80, "lat": 22},
        projection_scale=10,
        fitbounds='locations'
    )
    fig.update_layout(
        title=f'{crop} producing major districts'
    )
    
    fig.show()
    

Central Maharashtra stands out as the main region for the top crops. Rice cultivation is concentrated in the eastern districts and along the coast, which lies on the western side of the state. Some districts, such as Nanded, Chandrapur and Nagpur, appear particularly prominent, each specializing in different crops.
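
A choropleth silently leaves a district blank when its name does not match any feature in the GeoJSON, so it can help to compare the two name sets before plotting. The sketch below uses a hypothetical two-feature GeoJSON in place of `Maharashtra.geojson`; the `Dist_Name` property is assumed from the `featureidkey` used in the loop.

```python
# Hypothetical miniature GeoJSON; the real Maharashtra.geojson is assumed to
# expose district names under properties.Dist_Name, as the featureidkey suggests.
geojson_toy = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'Dist_Name': 'Pune'}, 'geometry': None},
        {'type': 'Feature', 'properties': {'Dist_Name': 'Nanded'}, 'geometry': None},
    ],
}
district_names = ['Pune', 'Nanded', 'Yavatmal']

# Names present in the data but missing from the map would not be drawn.
geo_names = {f['properties']['Dist_Name'] for f in geojson_toy['features']}
unmatched = sorted(set(district_names) - geo_names)
print(unmatched)
```

Against the real files, the same comparison with `geojson['features']` and `data['District Name']` would list any districts whose spelling differs between the CSV and the map.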

Thank You!